{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# A Bayesian Network to model the influence of energy consumption on greenhouse gases in Italy\n", "\n", "### by [Lorenzo Mario Amorosa](https://github.com/Lostefra)\n", "#### *Fundamentals of Artificial Intelligence and Knowledge Representation (Mod. 3) - Alma Mater Studiorum Università di Bologna*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Abstract\n", "\n", "Nowadays it is well established that **global warming** is hugely caused by greenhouse gases, which are indeed responsible for trapping heat in the atmosphere. The 3 most common gases are carbon dioxide (**CO**2), methane (**CH**4) and nitrous oxide (**N**2**O**) [[1]](#first).\n", "\n", "There are several sources of greenhouse gases, such as transportation, industry, commercial and residental. In this document I will tackle the problem in a general way, considering the impact of energy consumption on greenhouse gases emissions. Energy is indeed strictly related to almost all source factors. In particular, I will face the modelling of **causal relations** between **energy consumption and greenhouse gases in Italy** using a **Bayesian network**. The aim is to learn a model that can provide **probabilistic results given** some input **evidence**. The causal relations and their relative probabilities will be estimated by analyzing **annual growth factors** of several indicators from open source datasets of the World Bank Group (WBG) [[3]](#third). The choice of analyzing the annual growth aims to capture how the variation of an indicator can affect the others.\n", "\n", "This work starts from a paper by Cinar and Kayakutlu (2010) [[2]](#second) in which the authors produced estimates about energy investments in Turkey given historical data. Their work helped me to come up with interesting measures to be investigated and to be represented in the Bayesian network. I could extend their work adding: a more comprehensive analysis of network properties (**conditional independencies, active trails, Markov blankets**), concise and effective high level functions (such as `query_report`, `check_assertion` and `active_trails_of`) to express the most significant properties in a readable format and the whole code **to learn the Bayesian network** from the available datasets." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Network definition\n", "\n", "Generally we can expect that the increase of fossil fuel consumption determines the growth of greenhouse gases diffusion, whereas a wider use of renewable energies leads to a reduction of greenhouse gases emissions. In [[2]](#second) it is suggested that growth rate factors about population, urbanization and gross domestic product (GDP) can all influence the overall energy use in a nation. Given these assumptions, the Bayesian network is defined as follows: \n", "\n", "The terms in the nodes have the following labels in [[3]](#third):\n", "\n", "\n", "Pop = Population growth (annual %)\n", " \n", "Urb = Urban population growth (annual %)\n", "\n", "GDP = GDP per capita growth (annual %)\n", "\n", "EC = Energy use (kg of oil equivalent per capita) - [annual growth %]\n", "\n", "FFEC = Fossil fuel energy consumption (% of total) - [annual growth %]\n", "\n", "REC = Renewable energy consumption (% of total final energy consumption) - [annual growth %]\n", "\n", "EI = Energy imports, net (% of energy use) - [annual growth %]\n", "\n", "CO2 = CO2 emissions (metric tons per capita) - [annual growth %]\n", "\n", "CH4 = Methane emissions in energy sector (thousand metric tons of CO2 equivalent) - [annual growth %]\n", "\n", "N2O = Nitrous oxide emissions in energy sector (thousand metric tons of CO2 equivalent) - [annual growth %]\n", "\n", " \n", "The Bayesian network proposed differs from the one of [[2]](#second) because here greenhouse gases are kept distinct (CO2, CH4, N2O) and energy investments are not represented. \n", "\n", "In the following cell the Bayesian network is created using the pgmpy library." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from pgmpy.models import BayesianModel\n", "\n", "model = BayesianModel([('Pop', 'EC'), ('Urb', 'EC'), ('GDP', 'EC'),\n", " ('EC', 'FFEC'), ('EC', 'REC'), ('EC', 'EI'),\n", " ('REC', 'CO2'), ('REC', 'CH4'), ('REC', 'N2O'),\n", " ('FFEC', 'CO2'), ('FFEC', 'CH4'), ('FFEC', 'N2O')])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Datasets\n", "The **data** to compute the conditional probability tables (CPT) of the network are retrived from [[3]](#third). Some data are given there in absolute values and not in terms of **annual growth**, hence it is necessary to transform them properly (the labels of transformed data are marked with the trailing string \"*- [annual growth %]*\"). \n", "\n", "A good methodology to organize **historical data to infer CPT**, as shown also in [[2]](#second), is to **group** them **by year**, such that historical values of the same year are treated as a single tuple and thus they constitute a whole entry of the dataset. **Data discretization** is also needed, [since pgmpy does not support learning from continuous variables](https://github.com/pgmpy/pgmpy/issues/1084), and so all values are mapped into 3 fixed tiers. In order to minimize data sparsity, the boundaries of the mapping intervals are chosen such that each tier of the same node contains an equal number of samples.\n", "\n", "The pgmpy learning algorithms expect as input a dataframe in which each entry represents an *observed event* — a year in this case. As a consequence some **data preprocessing** is needed to make the raw data compatible with the required signature. It is worth noting that pgmpy handles learning task with **sparse datasets** (i.e. [it handles NaN values](https://pgmpy.org/estimators.html)). To compute indeed the CPT for a given node in the network there are used only those entries in which both the node itself and its parents are defined (i.e. an entry is used to compute a given CPT only if all needed values of that year are defined). This fact implies that the deeper the network is, the less sparse the dataset should be to compute significant CPT. \n", "\n", "As a side note, mind the fact that industrialization growth, which is expressed by indicators such as [industry value added](https://data.worldbank.org/indicator/NV.IND.TOTL.KD.ZG?locations=IT&view=chart), is not represented in the newtork. Data regarding industrialization are available only for a short year range in [[3]](#third) and the industrialization node would be on top of the network hierarchy, being a parent of the node 'EC'. As a consequence, if industrialization would be represented, then the overall learning process could rely only on a small subset of all the available data and the network parameters would be estimated less significantly.\n", "\n", "In the following cells data are imported and preprocessed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Raw data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 10 indicators in the dataframe.\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Country NameCountry CodeIndicator NameIndicator Code196019611962196319641965...2010201120122013201420152016201720182019
0ItalyITAPopulation growth (annual %)SP.POP.GROW1.9939280.6683830.6766230.7295530.8226240.842109...0.3075910.1719780.2695411.1592510.917504-0.096376-0.169884-0.149861-0.190064NaN
1ItalyITAUrban population growth (annual %)SP.URB.GROW2.8364011.4988071.5068331.5512871.6360271.642485...0.4804390.3430660.6195791.5878351.3413710.3257010.2461270.2629990.228198NaN
2ItalyITAGDP per capita growth (annual %)NY.GDP.PCAP.KD.ZGNaN7.4864195.4874784.8420521.9555332.402046...1.4009150.534287-3.242060-2.972404-0.9178140.8754771.4518751.8687150.966058NaN
3ItalyITAEnergy use (kg of oil equivalent per capita) -...EG.USE.PCAP.KG.OENaN12.06220013.06405311.1886219.1100767.753922...2.113919-3.486796-4.211107-4.791839-6.3962122.786129NaNNaNNaNNaN
4ItalyITAFossil fuel energy consumption (% of total) - ...EG.USE.COMM.FO.ZSNaN2.3440181.933224-0.1677281.075163-0.074481...-0.262284-0.193760-2.679745-2.721392-1.7231581.733165NaNNaNNaNNaN
\n", "

5 rows × 64 columns

\n", "
" ], "text/plain": [ " Country Name Country Code \\\n", "0 Italy ITA \n", "1 Italy ITA \n", "2 Italy ITA \n", "3 Italy ITA \n", "4 Italy ITA \n", "\n", " Indicator Name Indicator Code \\\n", "0 Population growth (annual %) SP.POP.GROW \n", "1 Urban population growth (annual %) SP.URB.GROW \n", "2 GDP per capita growth (annual %) NY.GDP.PCAP.KD.ZG \n", "3 Energy use (kg of oil equivalent per capita) -... EG.USE.PCAP.KG.OE \n", "4 Fossil fuel energy consumption (% of total) - ... EG.USE.COMM.FO.ZS \n", "\n", " 1960 1961 1962 1963 1964 1965 ... \\\n", "0 1.993928 0.668383 0.676623 0.729553 0.822624 0.842109 ... \n", "1 2.836401 1.498807 1.506833 1.551287 1.636027 1.642485 ... \n", "2 NaN 7.486419 5.487478 4.842052 1.955533 2.402046 ... \n", "3 NaN 12.062200 13.064053 11.188621 9.110076 7.753922 ... \n", "4 NaN 2.344018 1.933224 -0.167728 1.075163 -0.074481 ... \n", "\n", " 2010 2011 2012 2013 2014 2015 2016 \\\n", "0 0.307591 0.171978 0.269541 1.159251 0.917504 -0.096376 -0.169884 \n", "1 0.480439 0.343066 0.619579 1.587835 1.341371 0.325701 0.246127 \n", "2 1.400915 0.534287 -3.242060 -2.972404 -0.917814 0.875477 1.451875 \n", "3 2.113919 -3.486796 -4.211107 -4.791839 -6.396212 2.786129 NaN \n", "4 -0.262284 -0.193760 -2.679745 -2.721392 -1.723158 1.733165 NaN \n", "\n", " 2017 2018 2019 \n", "0 -0.149861 -0.190064 NaN \n", "1 0.262999 0.228198 NaN \n", "2 1.868715 0.966058 NaN \n", "3 NaN NaN NaN \n", "4 NaN NaN NaN \n", "\n", "[5 rows x 64 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pandas import read_csv, DataFrame\n", "import numpy as np\n", "\n", "def annual_growth(row, years):\n", " min_year = years[\"min\"]\n", " max_year = years[\"max\"]\n", " row[\"Indicator Name\"] = row[\"Indicator Name\"] + \" - [annual growth %]\"\n", " for year in range(max_year, min_year, -1):\n", " if not np.isnan(row[str(year)]) and not np.isnan(row[str(year - 1)]):\n", " row[str(year)] = 100 * (float(row[str(year)]) - float(row[str(year - 1)])) / abs(float(row[str(year - 1)]))\n", " else:\n", " row[str(year)] = np.nan \n", " row[str(min_year)] = np.nan\n", " return row\n", "\n", "years = {\"min\" : 1960, \"max\" : 2019}\n", "df_raw = read_csv(\"../csv/italy-raw-data.csv\")\n", "df_raw_growth = DataFrame(data=[row if \"growth\" in row[\"Indicator Name\"] else annual_growth(row, years) for index, row in df_raw.iterrows()])\n", "print(\"There are \" + str(df_raw_growth.shape[0]) + \" indicators in the dataframe.\")\n", "df_raw_growth.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Data cleaning" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PopUrbGDPECFFECRECEICO2CH4N2O
19601.993932.8364NaNNaNNaNNaNNaNNaNNaNNaN
19610.6683831.498817.4864212.06222.34402NaN5.0726113.8924NaNNaN
19620.6766231.506835.4874813.06411.93322NaN5.7539517.5887NaNNaN
19630.7295531.551294.8420511.1886-0.167728NaN2.5194312.5116NaNNaN
19640.8226241.636031.955539.110081.07516NaN0.6310286.78298NaNNaN
19650.8421091.642482.402057.75392-0.0744814NaN2.335067.84845NaNNaN
19660.7773041.568115.164168.706030.552096NaN1.5184712.9005NaNNaN
19670.7237781.503616.405688.416381.19174NaN1.9389.41875NaNNaN
19680.6317371.403955.873599.468680.880322NaN1.596276.42744NaNNaN
19690.5660591.326095.499187.893910.940778NaN0.7256988.22188NaNNaN
\n", "
" ], "text/plain": [ " Pop Urb GDP EC FFEC REC EI CO2 \\\n", "1960 1.99393 2.8364 NaN NaN NaN NaN NaN NaN \n", "1961 0.668383 1.49881 7.48642 12.0622 2.34402 NaN 5.07261 13.8924 \n", "1962 0.676623 1.50683 5.48748 13.0641 1.93322 NaN 5.75395 17.5887 \n", "1963 0.729553 1.55129 4.84205 11.1886 -0.167728 NaN 2.51943 12.5116 \n", "1964 0.822624 1.63603 1.95553 9.11008 1.07516 NaN 0.631028 6.78298 \n", "1965 0.842109 1.64248 2.40205 7.75392 -0.0744814 NaN 2.33506 7.84845 \n", "1966 0.777304 1.56811 5.16416 8.70603 0.552096 NaN 1.51847 12.9005 \n", "1967 0.723778 1.50361 6.40568 8.41638 1.19174 NaN 1.938 9.41875 \n", "1968 0.631737 1.40395 5.87359 9.46868 0.880322 NaN 1.59627 6.42744 \n", "1969 0.566059 1.32609 5.49918 7.89391 0.940778 NaN 0.725698 8.22188 \n", "\n", " CH4 N2O \n", "1960 NaN NaN \n", "1961 NaN NaN \n", "1962 NaN NaN \n", "1963 NaN NaN \n", "1964 NaN NaN \n", "1965 NaN NaN \n", "1966 NaN NaN \n", "1967 NaN NaN \n", "1968 NaN NaN \n", "1969 NaN NaN " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nodes = ['Pop', 'Urb', 'GDP', 'EC', 'FFEC', 'REC', 'EI', 'CO2', 'CH4', 'N2O']\n", "df_growth = df_raw_growth.transpose().iloc[4:]\n", "df_growth.columns = nodes\n", "df_growth.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Data discretization" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PopUrbGDPECFFECRECEICO2CH4N2O
1960C: +0.50 to +1.99C: +0.82 to +2.84NaNNaNNaNNaNNaNNaNNaNNaN
1961C: +0.50 to +1.99C: +0.82 to +2.84C: +2.89 to +7.49C: +2.79 to +13.06C: +0.16 to +2.34NaNC: +0.78 to +5.75C: +3.81 to +17.59NaNNaN
1962C: +0.50 to +1.99C: +0.82 to +2.84C: +2.89 to +7.49C: +2.79 to +13.06C: +0.16 to +2.34NaNC: +0.78 to +5.75C: +3.81 to +17.59NaNNaN
1963C: +0.50 to +1.99C: +0.82 to +2.84C: +2.89 to +7.49C: +2.79 to +13.06B: -0.40 to +0.16NaNC: +0.78 to +5.75C: +3.81 to +17.59NaNNaN
1964C: +0.50 to +1.99C: +0.82 to +2.84B: +1.24 to +2.89C: +2.79 to +13.06C: +0.16 to +2.34NaNB: -0.38 to +0.78C: +3.81 to +17.59NaNNaN
1965C: +0.50 to +1.99C: +0.82 to +2.84B: +1.24 to +2.89C: +2.79 to +13.06B: -0.40 to +0.16NaNC: +0.78 to +5.75C: +3.81 to +17.59NaNNaN
1966C: +0.50 to +1.99C: +0.82 to +2.84C: +2.89 to +7.49C: +2.79 to +13.06C: +0.16 to +2.34NaNC: +0.78 to +5.75C: +3.81 to +17.59NaNNaN
1967C: +0.50 to +1.99C: +0.82 to +2.84C: +2.89 to +7.49C: +2.79 to +13.06C: +0.16 to +2.34NaNC: +0.78 to +5.75C: +3.81 to +17.59NaNNaN
1968C: +0.50 to +1.99C: +0.82 to +2.84C: +2.89 to +7.49C: +2.79 to +13.06C: +0.16 to +2.34NaNC: +0.78 to +5.75C: +3.81 to +17.59NaNNaN
1969C: +0.50 to +1.99C: +0.82 to +2.84C: +2.89 to +7.49C: +2.79 to +13.06C: +0.16 to +2.34NaNB: -0.38 to +0.78C: +3.81 to +17.59NaNNaN
\n", "
" ], "text/plain": [ " Pop Urb GDP \\\n", "1960 C: +0.50 to +1.99 C: +0.82 to +2.84 NaN \n", "1961 C: +0.50 to +1.99 C: +0.82 to +2.84 C: +2.89 to +7.49 \n", "1962 C: +0.50 to +1.99 C: +0.82 to +2.84 C: +2.89 to +7.49 \n", "1963 C: +0.50 to +1.99 C: +0.82 to +2.84 C: +2.89 to +7.49 \n", "1964 C: +0.50 to +1.99 C: +0.82 to +2.84 B: +1.24 to +2.89 \n", "1965 C: +0.50 to +1.99 C: +0.82 to +2.84 B: +1.24 to +2.89 \n", "1966 C: +0.50 to +1.99 C: +0.82 to +2.84 C: +2.89 to +7.49 \n", "1967 C: +0.50 to +1.99 C: +0.82 to +2.84 C: +2.89 to +7.49 \n", "1968 C: +0.50 to +1.99 C: +0.82 to +2.84 C: +2.89 to +7.49 \n", "1969 C: +0.50 to +1.99 C: +0.82 to +2.84 C: +2.89 to +7.49 \n", "\n", " EC FFEC REC EI \\\n", "1960 NaN NaN NaN NaN \n", "1961 C: +2.79 to +13.06 C: +0.16 to +2.34 NaN C: +0.78 to +5.75 \n", "1962 C: +2.79 to +13.06 C: +0.16 to +2.34 NaN C: +0.78 to +5.75 \n", "1963 C: +2.79 to +13.06 B: -0.40 to +0.16 NaN C: +0.78 to +5.75 \n", "1964 C: +2.79 to +13.06 C: +0.16 to +2.34 NaN B: -0.38 to +0.78 \n", "1965 C: +2.79 to +13.06 B: -0.40 to +0.16 NaN C: +0.78 to +5.75 \n", "1966 C: +2.79 to +13.06 C: +0.16 to +2.34 NaN C: +0.78 to +5.75 \n", "1967 C: +2.79 to +13.06 C: +0.16 to +2.34 NaN C: +0.78 to +5.75 \n", "1968 C: +2.79 to +13.06 C: +0.16 to +2.34 NaN C: +0.78 to +5.75 \n", "1969 C: +2.79 to +13.06 C: +0.16 to +2.34 NaN B: -0.38 to +0.78 \n", "\n", " CO2 CH4 N2O \n", "1960 NaN NaN NaN \n", "1961 C: +3.81 to +17.59 NaN NaN \n", "1962 C: +3.81 to +17.59 NaN NaN \n", "1963 C: +3.81 to +17.59 NaN NaN \n", "1964 C: +3.81 to +17.59 NaN NaN \n", "1965 C: +3.81 to +17.59 NaN NaN \n", "1966 C: +3.81 to +17.59 NaN NaN \n", "1967 C: +3.81 to +17.59 NaN NaN \n", "1968 C: +3.81 to +17.59 NaN NaN \n", "1969 C: +3.81 to +17.59 NaN NaN " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "TIERS_NUM = 3\n", "\n", "def boundary_str(start, end, tier):\n", " return f'{tier}: {start:+0,.2f} to {end:+0,.2f}'\n", "\n", "def relabel(v, boundaries):\n", " if v >= boundaries[0][0] and v <= boundaries[0][1]:\n", " return boundary_str(boundaries[0][0], boundaries[0][1], tier='A')\n", " elif v >= boundaries[1][0] and v <= boundaries[1][1]: \n", " return boundary_str(boundaries[1][0], boundaries[1][1], tier='B')\n", " elif v >= boundaries[2][0] and v <= boundaries[2][1]:\n", " return boundary_str(boundaries[2][0], boundaries[2][1], tier='C')\n", " else:\n", " return np.nan\n", " \n", "def get_boundaries(tiers):\n", " prev_tier = tiers[0]\n", " boundaries = [(prev_tier[0], prev_tier[prev_tier.shape[0] - 1])]\n", " for index, tier in enumerate(tiers):\n", " if index is not 0:\n", " boundaries.append((prev_tier[prev_tier.shape[0] - 1], tier[tier.shape[0] - 1]))\n", " prev_tier = tier\n", " return boundaries\n", " \n", "new_columns = {}\n", "for i, content in enumerate(df_growth.items()): \n", " (label, series) = content\n", " values = np.sort(np.array([x for x in series.tolist() if not np.isnan(x)] , dtype=float))\n", " if values.shape[0] < TIERS_NUM:\n", " print(f'Error: there are not enough data for label {label}')\n", " break\n", " boundaries = get_boundaries(tiers=np.array_split(values, TIERS_NUM)) \n", " new_columns[label] = [relabel(value, boundaries) for value in series.tolist()]\n", " \n", "df = DataFrame(data=new_columns)\n", "df.columns = nodes\n", "df.index = range(years[\"min\"], years[\"max\"] + 1)\n", "df.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Learning of network parameters\n", "In pgmpy it is possible to learn the CPT of a given Bayesian network using either a **Bayesian Estimator** or a **Maximum Likelihood Estimator** (MLE). The former exploits a known prior distribution of data, the latter does not make any particular assumption.\n", "\n", "**MLE can overfit** the data in case of **small datasets**, because there can be not enough observations and thus the observed frequencies can be not representative. Another problem with MLE is the fact that **state counts** are done **conditionally** for each **parents configuration** and this causes immense fragmentation since the state counts drop even more. The **Bayesian estimator** instead does not only rely on input data to learn the network parameters, but it also takes advantage of a **prior knowledge**, expressed through a prior distribution. In this way, the estimator does not have an absolute guide, but rather a reasonable **starting assumption** that allows to counterbalance the lack of data. \n", "\n", "Although the MLE approach seems plausible, it can be overly simplistic in many cases, whereas the Bayesian one is intrinsically more robust. As a consequence the **Bayesian estimator is choosen**. \n", "\n", "There are several prior distribuitions available in pgmpy, a sensible choice of **prior** is the **Bayesian Dirichlet equivalent uniform prior** (BDeu). In the learning process, using BDeu, **N uniform samples** are **generated** for each variable to compute the pseudo-counts (default is N=5), hence the estimated probabilities in CPT are more conservative than the ones obtained through MLE (i.e. probabilities close to 1 or 0 get smoothed).\n", "\n", "In the following cell the CPT are learned and displayed." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Check model: True\n", "\n", "CPT of Pop:\n", "+------------------------+----------+\n", "| Pop(A: -0.19 to +0.07) | 0.338164 |\n", "+------------------------+----------+\n", "| Pop(B: +0.07 to +0.50) | 0.338164 |\n", "+------------------------+----------+\n", "| Pop(C: +0.50 to +1.99) | 0.323671 |\n", "+------------------------+----------+ \n", "\n", "CPT of EC:\n", "+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+\n", "| GDP | GDP(A: -5.71 to +1.24) | GDP(A: -5.71 to +1.24) | GDP(A: -5.71 to +1.24) | GDP(A: -5.71 to +1.24) | GDP(A: -5.71 to +1.24) | GDP(A: -5.71 to +1.24) | GDP(A: -5.71 to +1.24) | GDP(A: -5.71 to +1.24) | GDP(A: -5.71 to +1.24) | GDP(B: +1.24 to +2.89) | GDP(B: +1.24 to +2.89) | GDP(B: +1.24 to +2.89) | GDP(B: +1.24 to +2.89) | GDP(B: +1.24 to +2.89) | GDP(B: +1.24 to +2.89) | GDP(B: +1.24 to +2.89) | GDP(B: +1.24 to +2.89) | GDP(B: +1.24 to +2.89) | GDP(C: +2.89 to +7.49) | GDP(C: +2.89 to +7.49) | GDP(C: +2.89 to +7.49) | GDP(C: +2.89 to +7.49) | GDP(C: +2.89 to +7.49) | GDP(C: +2.89 to +7.49) | GDP(C: +2.89 to +7.49) | GDP(C: +2.89 to +7.49) | GDP(C: +2.89 to +7.49) |\n", "+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+\n", "| Pop | Pop(A: -0.19 to +0.07) | Pop(A: -0.19 to +0.07) | Pop(A: -0.19 to +0.07) | Pop(B: +0.07 to +0.50) | Pop(B: +0.07 to +0.50) | Pop(B: +0.07 to +0.50) | Pop(C: +0.50 to +1.99) | Pop(C: +0.50 to +1.99) | Pop(C: +0.50 to +1.99) | Pop(A: -0.19 to +0.07) | Pop(A: -0.19 to +0.07) | Pop(A: -0.19 to +0.07) | Pop(B: +0.07 to +0.50) | Pop(B: +0.07 to +0.50) | Pop(B: +0.07 to +0.50) | Pop(C: +0.50 to +1.99) | Pop(C: +0.50 to +1.99) | Pop(C: +0.50 to +1.99) | Pop(A: -0.19 to +0.07) | Pop(A: -0.19 to +0.07) | Pop(A: -0.19 to +0.07) | Pop(B: +0.07 to +0.50) | Pop(B: +0.07 to +0.50) | Pop(B: +0.07 to +0.50) | Pop(C: +0.50 to +1.99) | Pop(C: +0.50 to +1.99) | Pop(C: +0.50 to +1.99) |\n", "+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+\n", "| Urb | Urb(A: -0.02 to +0.15) | Urb(B: +0.15 to +0.82) | Urb(C: +0.82 to +2.84) | Urb(A: -0.02 to +0.15) | Urb(B: +0.15 to +0.82) | Urb(C: +0.82 to +2.84) | Urb(A: -0.02 to +0.15) | Urb(B: +0.15 to +0.82) | Urb(C: +0.82 to +2.84) | Urb(A: -0.02 to +0.15) | Urb(B: +0.15 to +0.82) | Urb(C: +0.82 to +2.84) | Urb(A: -0.02 to +0.15) | Urb(B: +0.15 to +0.82) | Urb(C: +0.82 to +2.84) | Urb(A: -0.02 to +0.15) | Urb(B: +0.15 to +0.82) | Urb(C: +0.82 to +2.84) | Urb(A: -0.02 to +0.15) | Urb(B: +0.15 to +0.82) | Urb(C: +0.82 to +2.84) | Urb(A: -0.02 to +0.15) | Urb(B: +0.15 to +0.82) | Urb(C: +0.82 to +2.84) | Urb(A: -0.02 to +0.15) | Urb(B: +0.15 to +0.82) | Urb(C: +0.82 to +2.84) |\n", "+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+\n", "| EC(A: -7.05 to -0.12) | 0.9435028248587572 | 0.09009009009009006 | 0.3333333333333333 | 0.8198198198198197 | 0.6120943952802359 | 0.3333333333333333 | 0.3333333333333333 | 0.09009009009009006 | 0.9435028248587572 | 0.13421828908554573 | 0.3333333333333333 | 0.3333333333333333 | 0.05208333333333332 | 0.4858757062146893 | 0.8198198198198197 | 0.3333333333333333 | 0.3333333333333333 | 0.052083333333333336 | 0.02824858757062147 | 0.3333333333333333 | 0.3333333333333333 | 0.09009009009009009 | 0.3333333333333333 | 0.3333333333333333 | 0.3333333333333333 | 0.3333333333333333 | 0.010857763300760043 |\n", "+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+\n", "| EC(B: -0.12 to +2.79) | 0.028248587570621472 | 0.8198198198198197 | 0.3333333333333333 | 0.09009009009009006 | 0.2536873156342183 | 0.3333333333333333 | 0.3333333333333333 | 0.8198198198198197 | 0.028248587570621472 | 0.7315634218289085 | 0.3333333333333333 | 0.3333333333333333 | 0.8958333333333333 | 0.4858757062146893 | 0.09009009009009006 | 0.3333333333333333 | 0.3333333333333333 | 0.052083333333333336 | 0.7146892655367232 | 0.3333333333333333 | 0.3333333333333333 | 0.09009009009009009 | 0.036630036630036625 | 0.3333333333333333 | 0.3333333333333333 | 0.3333333333333333 | 0.09880564603691641 |\n", "+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+\n", "| EC(C: +2.79 to +13.06) | 0.028248587570621472 | 0.09009009009009006 | 0.3333333333333333 | 0.09009009009009006 | 0.13421828908554573 | 0.3333333333333333 | 0.3333333333333333 | 0.09009009009009006 | 0.028248587570621472 | 0.13421828908554573 | 0.3333333333333333 | 0.3333333333333333 | 0.05208333333333332 | 0.02824858757062147 | 0.09009009009009006 | 0.3333333333333333 | 0.3333333333333333 | 0.8958333333333335 | 0.2570621468926554 | 0.3333333333333333 | 0.3333333333333333 | 0.8198198198198199 | 0.63003663003663 | 0.3333333333333333 | 0.3333333333333333 | 0.3333333333333333 | 0.8903365906623236 |\n", "+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+ \n", "\n", "CPT of Urb:\n", "+------------------------+----------+\n", "| Urb(A: -0.02 to +0.15) | 0.338164 |\n", "+------------------------+----------+\n", "| Urb(B: +0.15 to +0.82) | 0.338164 |\n", "+------------------------+----------+\n", "| Urb(C: +0.82 to +2.84) | 0.323671 |\n", "+------------------------+----------+ \n", "\n", "CPT of GDP:\n", "+------------------------+----------+\n", "| GDP(A: -5.71 to +1.24) | 0.343137 |\n", "+------------------------+----------+\n", "| GDP(B: +1.24 to +2.89) | 0.328431 |\n", "+------------------------+----------+\n", "| GDP(C: +2.89 to +7.49) | 0.328431 |\n", "+------------------------+----------+ \n", "\n", "CPT of FFEC:\n", "+-------------------------+-----------------------+-----------------------+------------------------+\n", "| EC | EC(A: -7.05 to -0.12) | EC(B: -0.12 to +2.79) | EC(C: +2.79 to +13.06) |\n", "+-------------------------+-----------------------+-----------------------+------------------------+\n", "| FFEC(A: -2.72 to -0.40) | 0.5870646766169154 | 0.3333333333333333 | 0.09895833333333334 |\n", "+-------------------------+-----------------------+-----------------------+------------------------+\n", "| FFEC(B: -0.40 to +0.16) | 0.36318407960199006 | 0.4270833333333333 | 0.19270833333333331 |\n", "+-------------------------+-----------------------+-----------------------+------------------------+\n", "| FFEC(C: +0.16 to +2.34) | 0.04975124378109453 | 0.23958333333333331 | 0.7083333333333334 |\n", "+-------------------------+-----------------------+-----------------------+------------------------+ \n", "\n", "CPT of REC:\n", "+--------------------------+-----------------------+-----------------------+------------------------+\n", "| EC | EC(A: -7.05 to -0.12) | EC(B: -0.12 to +2.79) | EC(C: +2.79 to +13.06) |\n", "+--------------------------+-----------------------+-----------------------+------------------------+\n", "| REC(A: -14.97 to +2.03) | 0.2028985507246377 | 0.49612403100775193 | 0.3958333333333333 |\n", "+--------------------------+-----------------------+-----------------------+------------------------+\n", "| REC(B: +2.03 to +11.51) | 0.3333333333333333 | 0.2868217054263566 | 0.3958333333333333 |\n", "+--------------------------+-----------------------+-----------------------+------------------------+\n", "| REC(C: +11.51 to +23.95) | 0.463768115942029 | 0.2170542635658915 | 0.20833333333333331 |\n", "+--------------------------+-----------------------+-----------------------+------------------------+ \n", "\n", "CPT of EI:\n", "+-----------------------+-----------------------+-----------------------+------------------------+\n", "| EC | EC(A: -7.05 to -0.12) | EC(B: -0.12 to +2.79) | EC(C: +2.79 to +13.06) |\n", "+-----------------------+-----------------------+-----------------------+------------------------+\n", "| EI(A: -3.27 to -0.38) | 0.6766169154228856 | 0.2864583333333333 | 0.052083333333333336 |\n", "+-----------------------+-----------------------+-----------------------+------------------------+\n", "| EI(B: -0.38 to +0.78) | 0.2288557213930348 | 0.4739583333333333 | 0.2864583333333333 |\n", "+-----------------------+-----------------------+-----------------------+------------------------+\n", "| EI(C: +0.78 to +5.75) | 0.09452736318407962 | 0.23958333333333331 | 0.6614583333333334 |\n", "+-----------------------+-----------------------+-----------------------+------------------------+ \n", "\n", "CPT of CO2:\n", "+-------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+\n", "| FFEC | FFEC(A: -2.72 to -0.40) | FFEC(A: -2.72 to -0.40) | FFEC(A: -2.72 to -0.40) | FFEC(B: -0.40 to +0.16) | FFEC(B: -0.40 to +0.16) | FFEC(B: -0.40 to +0.16) | FFEC(C: +0.16 to +2.34) | FFEC(C: +0.16 to +2.34) | FFEC(C: +0.16 to +2.34) |\n", "+-------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+\n", "| REC | REC(A: -14.97 to +2.03) | REC(B: +2.03 to +11.51) | REC(C: +11.51 to +23.95) | REC(A: -14.97 to +2.03) | REC(B: +2.03 to +11.51) | REC(C: +11.51 to +23.95) | REC(A: -14.97 to +2.03) | REC(B: +2.03 to +11.51) | REC(C: +11.51 to +23.95) |\n", "+-------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+\n", "| CO2(A: -10.20 to -0.79) | 0.3333333333333333 | 0.463768115942029 | 0.6991869918699187 | 0.3333333333333333 | 0.463768115942029 | 0.3333333333333333 | 0.11904761904761905 | 0.3333333333333333 | 0.3333333333333333 |\n", "+-------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+\n", "| CO2(B: -0.79 to +3.81) | 0.3333333333333333 | 0.463768115942029 | 0.26016260162601623 | 0.6145833333333334 | 0.463768115942029 | 0.3333333333333333 | 0.44047619047619047 | 0.3333333333333333 | 0.3333333333333333 |\n", "+-------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+\n", "| CO2(C: +3.81 to +17.59) | 0.3333333333333333 | 0.07246376811594203 | 0.04065040650406504 | 0.052083333333333336 | 0.07246376811594203 | 0.3333333333333333 | 0.44047619047619047 | 0.3333333333333333 | 0.3333333333333333 |\n", "+-------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+ \n", "\n", "CPT of CH4:\n", "+------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+\n", "| FFEC | FFEC(A: -2.72 to -0.40) | FFEC(A: -2.72 to -0.40) | FFEC(A: -2.72 to -0.40) | FFEC(B: -0.40 to +0.16) | FFEC(B: -0.40 to +0.16) | FFEC(B: -0.40 to +0.16) | FFEC(C: +0.16 to +2.34) | FFEC(C: +0.16 to +2.34) | FFEC(C: +0.16 to +2.34) |\n", "+------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+\n", "| REC | REC(A: -14.97 to +2.03) | REC(B: +2.03 to +11.51) | REC(C: +11.51 to +23.95) | REC(A: -14.97 to +2.03) | REC(B: +2.03 to +11.51) | REC(C: +11.51 to +23.95) | REC(A: -14.97 to +2.03) | REC(B: +2.03 to +11.51) | REC(C: +11.51 to +23.95) |\n", "+------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+\n", "| CH4(A: -6.02 to -0.31) | 0.3333333333333333 | 0.5765765765765766 | 0.22424242424242424 | 0.855072463768116 | 0.6594202898550725 | 0.3333333333333333 | 0.761904761904762 | 0.3333333333333333 | 0.3333333333333333 |\n", "+------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+\n", "| CH4(B: -0.31 to +2.11) | 0.3333333333333333 | 0.3333333333333333 | 0.7151515151515152 | 0.07246376811594203 | 0.2681159420289855 | 0.3333333333333333 | 0.11904761904761905 | 0.3333333333333333 | 0.3333333333333333 |\n", "+------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+\n", "| CH4(C: +2.11 to +9.00) | 0.3333333333333333 | 0.0900900900900901 | 0.06060606060606061 | 0.07246376811594203 | 0.07246376811594203 | 0.3333333333333333 | 0.11904761904761905 | 0.3333333333333333 | 0.3333333333333333 |\n", "+------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+ \n", "\n", "CPT of N2O:\n", "+-------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+\n", "| FFEC | FFEC(A: -2.72 to -0.40) | FFEC(A: -2.72 to -0.40) | FFEC(A: -2.72 to -0.40) | FFEC(B: -0.40 to +0.16) | FFEC(B: -0.40 to +0.16) | FFEC(B: -0.40 to +0.16) | FFEC(C: +0.16 to +2.34) | FFEC(C: +0.16 to +2.34) | FFEC(C: +0.16 to +2.34) |\n", "+-------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+\n", "| REC | REC(A: -14.97 to +2.03) | REC(B: +2.03 to +11.51) | REC(C: +11.51 to +23.95) | REC(A: -14.97 to +2.03) | REC(B: +2.03 to +11.51) | REC(C: +11.51 to +23.95) | REC(A: -14.97 to +2.03) | REC(B: +2.03 to +11.51) | REC(C: +11.51 to +23.95) |\n", "+-------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+\n", "| N2O(A: -4.31 to +0.61) | 0.3333333333333333 | 0.5765765765765766 | 0.7151515151515152 | 0.463768115942029 | 0.07246376811594203 | 0.3333333333333333 | 0.11904761904761905 | 0.3333333333333333 | 0.3333333333333333 |\n", "+-------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+\n", "| N2O(B: +0.61 to +2.75) | 0.3333333333333333 | 0.3333333333333333 | 0.06060606060606061 | 0.2681159420289855 | 0.6594202898550725 | 0.3333333333333333 | 0.11904761904761905 | 0.3333333333333333 | 0.3333333333333333 |\n", "+-------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+\n", "| N2O(C: +2.75 to +12.61) | 0.3333333333333333 | 0.0900900900900901 | 0.22424242424242424 | 0.2681159420289855 | 0.2681159420289855 | 0.3333333333333333 | 0.761904761904762 | 0.3333333333333333 | 0.3333333333333333 |\n", "+-------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+-------------------------+-------------------------+--------------------------+ \n", "\n" ] } ], "source": [ "from pgmpy.estimators import BayesianEstimator, MaximumLikelihoodEstimator\n", "from IPython.core.display import display, HTML\n", "\n", "# disable text wrapping in output cell\n", "display(HTML(\"\"))\n", "\n", "model.cpds = []\n", "model.fit(data=df, \n", " estimator=BayesianEstimator,\n", " prior_type=\"BDeu\",\n", " equivalent_sample_size=10,\n", " complete_samples_only=False)\n", "\n", "print(f'Check model: {model.check_model()}\\n')\n", "for cpd in model.get_cpds():\n", " print(f'CPT of {cpd.variable}:')\n", " print(cpd, '\\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Network analysis\n", "\n", "Among all the features, in pgmpy it is possible to investigate several **properties of the network**. For instance, it is possible to check for **conditional independencies** and **active trails** with respect to some given evidence or to ask for the **Markov blanket** of a node." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There can be made 2759 valid independence assertions with respect to the all possible given evidence.\n", "For instance, any node in the network is independent of its non-descendents given its parents (local semantics):\n", " \n", "(Pop _|_ GDP, Urb)\n", "(Urb _|_ Pop, GDP)\n", "(GDP _|_ Pop, Urb)\n", "(FFEC _|_ EI, Pop, REC, GDP, Urb | EC)\n", "(REC _|_ EI, Pop, FFEC, GDP, Urb | EC)\n", "(EI _|_ N2O, REC, GDP, Urb, CO2, Pop, FFEC, CH4 | EC)\n", "(CO2 _|_ N2O, EI, Pop, Urb, GDP, CH4, EC | FFEC, REC)\n", "(CH4 _|_ N2O, EI, Pop, Urb, GDP, CO2, EC | FFEC, REC)\n", "(N2O _|_ EI, Pop, Urb, GDP, CO2, CH4, EC | FFEC, REC)\n", "\n", "Active trails between 'FFEC' and {'N2O', 'REC', 'Urb', 'GDP', 'CO2', 'EI', 'Pop', 'CH4', 'EC'} given no evidence.\n", "Active trails between 'FFEC' and {'CO2', 'CH4', 'N2O'} given the evidence {'EC'}.\n", "Active trails between 'FFEC' and {'CH4', 'N2O', 'REC'} given the evidence {'CO2', 'EC'}.\n", "No active trails for 'EI' given the evidence {'EC'}.\n", "Active trails between 'CH4' and {'CO2', 'N2O', 'REC'} given the evidence {'FFEC', 'EC'}.\n", "\n", "Markov blanket of 'Pop' is {'Urb', 'GDP', 'EC'}\n", "Markov blanket of 'EC' is {'REC', 'GDP', 'Urb', 'EI', 'Pop', 'FFEC'}\n", "Markov blanket of 'REC' is {'N2O', 'CO2', 'FFEC', 'CH4', 'EC'}\n", "Markov blanket of 'CH4' is {'FFEC', 'REC'}\n" ] } ], "source": [ "print(f'There can be made {len(model.get_independencies().get_assertions())}',\n", " 'valid independence assertions with respect to the all possible given evidence.')\n", "print('For instance, any node in the network is independent of its non-descendents given its parents (local semantics):\\n',\n", " f'\\n{model.local_independencies(nodes)}\\n')\n", "\n", "def active_trails_of(query, evidence):\n", " active = model.active_trail_nodes(query, observed=evidence).get(query)\n", " active.remove(query)\n", " if active:\n", " if evidence:\n", " print(f'Active trails between \\'{query}\\' and {active} given the evidence {set(evidence)}.')\n", " else:\n", " print(f'Active trails between \\'{query}\\' and {active} given no evidence.')\n", " else:\n", " print(f'No active trails for \\'{query}\\' given the evidence {set(evidence)}.')\n", " \n", "def markov_blanket_of(node):\n", " print(f'Markov blanket of \\'{node}\\' is {set(model.get_markov_blanket(node))}')\n", "\n", "active_trails_of(query='FFEC', evidence=[])\n", "active_trails_of(query='FFEC', evidence=['EC'])\n", "active_trails_of(query='FFEC', evidence=['EC', 'CO2'])\n", "active_trails_of(query='EI', evidence=['EC'])\n", "active_trails_of(query='CH4', evidence=['EC', 'FFEC'])\n", "print()\n", "markov_blanket_of(node='Pop')\n", "markov_blanket_of(node='EC')\n", "markov_blanket_of(node='REC')\n", "markov_blanket_of(node='CH4') # note: a bug in pgmpy 0.1.10 raise a KeyError here. \n", " # I opened an issue and I got accepted my pull request with the bug fix: \n", " # https://github.com/pgmpy/pgmpy/issues/1293\n", " # https://github.com/pgmpy/pgmpy/pull/1294" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **independence assertions**, such as $(A \\perp B | C)$, are implemented in pgmpy with the class ```IndependenceAssertion```, which has 3 fields: \n", "\n", " - ```event1```: random variable which is independent ($A$ in the example). \n", " \n", " - ```event2```: random variables from which ```event1``` is independent ($B$ in the example). \n", " \n", " - ```event3```: random variables given which ```event1``` is independent of ```event2``` ($C$ in the example).\n", "\n", "In the following cell it is evaluated which are the nodes that have the maximum and the minimum number of appearance in independence assertions as independent variable or evidence. It can be notice that the closer a node is to the *core* of the network, the less are the independence assertions in which it is the independent variable and the more are the ones in which it is given as evidence." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nodes which appear most (311 times) in independence assertions as independent variable are:\n", "{'EI', 'Pop', 'GDP', 'Urb'}\n", "Nodes which appear least (112 times) in independence assertions as independent variable are:\n", "{'EC'}\n", "Nodes which appear most (2222 times) in independence assertions as evidence are:\n", "{'EC'}\n", "Nodes which appear least (1179 times) in independence assertions as evidence are:\n", "{'EI'}\n" ] } ], "source": [ "def independent_assertions_score_function(node):\n", " return len([a for a in model.get_independencies().get_assertions() if node in a.event1])\n", "\n", "def evidence_assertions_score_function(node):\n", " return len([a for a in model.get_independencies().get_assertions() if node in a.event3])\n", "\n", "def update(assertion_dict, node, score_function):\n", " tmp_score = score_function(node)\n", " if tmp_score == assertion_dict[\"max\"][\"score\"]:\n", " assertion_dict[\"max\"][\"nodes\"].append(node)\n", " elif tmp_score > assertion_dict[\"max\"][\"score\"]:\n", " assertion_dict[\"max\"][\"nodes\"] = [node]\n", " assertion_dict[\"max\"][\"score\"] = tmp_score\n", " if tmp_score == assertion_dict[\"min\"][\"score\"]:\n", " assertion_dict[\"min\"][\"nodes\"].append(node)\n", " elif tmp_score < assertion_dict[\"min\"][\"score\"]:\n", " assertion_dict[\"min\"][\"nodes\"] = [node]\n", " assertion_dict[\"min\"][\"score\"] = tmp_score \n", "\n", "if len(nodes) > 1:\n", " independent_init = independent_assertions_score_function(nodes[0])\n", " independent_dict = {\"max\": {\"nodes\": [nodes[0]], \"score\": independent_init}, \n", " \"min\": {\"nodes\": [nodes[0]], \"score\": independent_init}}\n", " evidence_init = evidence_assertions_score_function(nodes[0]) \n", " evidence_dict = {\"max\": {\"nodes\": [nodes[0]], \"score\": evidence_init}, \n", " \"min\": {\"nodes\": [nodes[0]], \"score\": evidence_init}} \n", " for node in nodes[1:]:\n", " update(independent_dict, node, independent_assertions_score_function)\n", " update(evidence_dict, node, evidence_assertions_score_function)\n", "\n", "print(f'Nodes which appear most ({independent_dict[\"max\"][\"score\"]} times) in independence assertions',\n", " f'as independent variable are:\\n{set(independent_dict[\"max\"][\"nodes\"])}')\n", "print(f'Nodes which appear least ({independent_dict[\"min\"][\"score\"]} times) in independence assertions',\n", " f'as independent variable are:\\n{set(independent_dict[\"min\"][\"nodes\"])}')\n", "print(f'Nodes which appear most ({evidence_dict[\"max\"][\"score\"]} times) in independence assertions',\n", " f'as evidence are:\\n{set(evidence_dict[\"max\"][\"nodes\"])}')\n", "print(f'Nodes which appear least ({evidence_dict[\"min\"][\"score\"]} times) in independence assertions',\n", " f'as evidence are:\\n{set(evidence_dict[\"min\"][\"nodes\"])}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some queries on independence assertions are proposed." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(EI _|_ N2O | EC): True\n", "(EI _|_ REC, Urb, GDP, Pop, FFEC | EC): True\n", "(EC _|_ CH4 | FFEC): False\n", "(EC _|_ CH4 | FFEC, REC): True\n", "(FFEC _|_ REC | EC): True\n", "(FFEC _|_ REC | CO2, EC): False\n", "(CH4 _|_ CO2 | FFEC): False\n", "(CH4 _|_ CO2 | FFEC, REC): True\n" ] } ], "source": [ "from pgmpy.independencies.Independencies import IndependenceAssertion\n", "\n", "def check_assertion(model, independent, from_variables, evidence):\n", " assertion = IndependenceAssertion(independent, from_variables, evidence)\n", " result = False\n", " for a in model.get_independencies().get_assertions():\n", " if frozenset(assertion.event1) == a.event1 and assertion.event2 <= a.event2 and frozenset(assertion.event3) == a.event3:\n", " result = True\n", " break\n", " print(f'{assertion}: {result}')\n", "\n", "check_assertion(model, independent=['EI'], from_variables=['N2O'], evidence=['EC'])\n", "check_assertion(model, independent=['EI'], from_variables=[\"FFEC\", \"REC\", \"GDP\", \"Pop\", \"Urb\"], evidence=['EC'])\n", "check_assertion(model, independent=['EC'], from_variables=[\"CH4\"], evidence=['FFEC'])\n", "check_assertion(model, independent=['EC'], from_variables=[\"CH4\"], evidence=['FFEC', 'REC'])\n", "check_assertion(model, independent=['FFEC'], from_variables=[\"REC\"], evidence=['EC'])\n", "check_assertion(model, independent=['FFEC'], from_variables=[\"REC\"], evidence=['EC', 'CO2'])\n", "check_assertion(model, independent=['CH4'], from_variables=[\"CO2\"], evidence=['FFEC'])\n", "check_assertion(model, independent=['CH4'], from_variables=[\"CO2\"], evidence=['FFEC', 'REC'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inferences\n", "In pgmpy it is possible to perform **inference** on a Bayesian network through the **Variable Elimination method**. The **elimination order** is evaluated through **heuristic functions**, which assign an elimination cost to each node that has to be removed (i.e. to each node apart from the query nodes and the evidence nodes). The variable elimination algorithm is then executed accordingly with the assigned node costs.\n", "\n", "The [available heuristic functions](https://pgmpy.org/inference.html#module-pgmpy.inference.EliminationOrder) assign the cost of a node as:\n", "\n", "- `MinFill`: the number of edges that need to be added to the graph due to its elimination.\n", "\n", "- `MinNeighbors`: the number of neighbors it has in the current graph.\n", "\n", "- `MinWeight`: the product of weights, domain cardinality, of its neighbors.\n", "\n", "- `WeightedMinFill`: the sum of weights of the edges that need to be added to the graph due to its elimination, where a weight of an edge is the product of the weights, domain cardinality, of its constituent vertices.\n", "\n", "The following cell is dedicated to inference queries and experiments on the cost heuristic functions. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Elimination order for ['GDP'] with no evidence computed through MinNeighbors heuristic:\n", "['EI', 'N2O', 'CO2', 'CH4', 'REC', 'FFEC', 'Urb', 'Pop', 'EC']\n", "--- Ordering found in 0.0022 seconds ---\n", "\n", "Probability query of ['GDP'] with no evidence through precomputed elimination order:\n", "+------------------------+------------+\n", "| GDP | phi(GDP) |\n", "+========================+============+\n", "| GDP(A: -5.71 to +1.24) | 0.3431 |\n", "+------------------------+------------+\n", "| GDP(B: +1.24 to +2.89) | 0.3284 |\n", "+------------------------+------------+\n", "| GDP(C: +2.89 to +7.49) | 0.3284 |\n", "+------------------------+------------+\n", "--- Query executed in 0.0196 seconds ---\n", "\n", "Probability query of ['GDP'] with no evidence through dummy elimination order:\n", "+------------------------+------------+\n", "| GDP | phi(GDP) |\n", "+========================+============+\n", "| GDP(A: -5.71 to +1.24) | 0.3431 |\n", "+------------------------+------------+\n", "| GDP(B: +1.24 to +2.89) | 0.3284 |\n", "+------------------------+------------+\n", "| GDP(C: +2.89 to +7.49) | 0.3284 |\n", "+------------------------+------------+\n", "--- Query executed in 0.1300 seconds ---\n", "\n", "Different elimination order found for probability query of ['GDP'] with no evidence:\n", "MinFill: ['N2O', 'Urb', 'CO2', 'REC', 'EI', 'Pop', 'FFEC', 'CH4', 'EC']\n", "MinNeighbors: ['EI', 'N2O', 'CO2', 'CH4', 'REC', 'FFEC', 'Urb', 'Pop', 'EC']\n", "MinWeight: ['EI', 'N2O', 'CO2', 'CH4', 'REC', 'FFEC', 'Urb', 'Pop', 'EC']\n", "WeightedMinFill: ['EI', 'N2O', 'CO2', 'CH4', 'REC', 'FFEC', 'Urb', 'Pop', 'EC']\n", "\n", "Probability query of ['CO2'] with evidence {'EC': 'A: -7.05 to -0.12'} computed through MinFill heuristic:\n", "+-------------------------+------------+\n", "| CO2 | phi(CO2) |\n", "+=========================+============+\n", "| CO2(A: -10.20 to -0.79) | 0.4721 |\n", "+-------------------------+------------+\n", "| CO2(B: -0.79 to +3.81) | 0.3765 |\n", "+-------------------------+------------+\n", "| CO2(C: +3.81 to +17.59) | 0.1514 |\n", "+-------------------------+------------+\n", "--- Query executed in 0.0237 seconds ---\n", "\n", "Different elimination order found for probability query of ['CO2'] with evidence {'EC': 'A: -7.05 to -0.12'}:\n", "MinFill: ['N2O', 'GDP', 'Urb', 'EI', 'Pop', 'CH4', 'REC', 'FFEC']\n", "MinNeighbors: ['EI', 'N2O', 'CH4', 'REC', 'FFEC', 'GDP', 'Urb', 'Pop']\n", "MinWeight: ['EI', 'N2O', 'CH4', 'REC', 'FFEC', 'GDP', 'Urb', 'Pop']\n", "WeightedMinFill: ['EI', 'N2O', 'CH4', 'REC', 'FFEC', 'GDP', 'Urb', 'Pop']\n", "\n", "Probability query of ['CO2'] with evidence {'EC': 'A: -7.05 to -0.12'} computed through MinNeighbors heuristic:\n", "+-------------------------+------------+\n", "| CO2 | phi(CO2) |\n", "+=========================+============+\n", "| CO2(A: -10.20 to -0.79) | 0.4721 |\n", "+-------------------------+------------+\n", "| CO2(B: -0.79 to +3.81) | 0.3765 |\n", "+-------------------------+------------+\n", "| CO2(C: +3.81 to +17.59) | 0.1514 |\n", "+-------------------------+------------+\n", "--- Query executed in 0.0302 seconds ---\n", "\n" ] } ], "source": [ "from pgmpy.inference import VariableElimination\n", "import time\n", "\n", "def query_report(infer, variables, evidence=None, elimination_order=\"MinFill\", show_progress=False, desc=\"\"):\n", " if desc:\n", " print(desc)\n", " start_time = time.time()\n", " print(infer.query(variables=variables, \n", " evidence=evidence, \n", " elimination_order=elimination_order, \n", " show_progress=show_progress))\n", " print(f'--- Query executed in {time.time() - start_time:0,.4f} seconds ---\\n')\n", " \n", "def get_ordering(infer, variables, evidence=None, elimination_order=\"MinFill\", show_progress=False, desc=\"\"):\n", " start_time = time.time()\n", " ordering = infer._get_elimination_order(variables=variables, \n", " evidence=evidence, \n", " elimination_order=elimination_order, \n", " show_progress=show_progress)\n", " if desc:\n", " print(desc, ordering, sep='\\n')\n", " print(f'--- Ordering found in {time.time() - start_time:0,.4f} seconds ---\\n')\n", " return ordering\n", "\n", "def padding(heuristic):\n", " return (heuristic + \":\").ljust(16)\n", "\n", "def compare_all_ordering(infer, variables, evidence=None, show_progress=False):\n", " ord_dict = {\n", " \"MinFill\": get_ordering(infer, variables, evidence, \"MinFill\", show_progress),\n", " \"MinNeighbors\": get_ordering(infer, variables, evidence, \"MinNeighbors\", show_progress),\n", " \"MinWeight\": get_ordering(infer, variables, evidence, \"MinWeight\", show_progress),\n", " \"WeightedMinFill\": get_ordering(infer, variables, evidence, \"WeightedMinFill\", show_progress)\n", " }\n", " if not evidence:\n", " pre = f'elimination order found for probability query of {variables} with no evidence:'\n", " else:\n", " pre = f'elimination order found for probability query of {variables} with evidence {evidence}:'\n", " if ord_dict[\"MinFill\"] == ord_dict[\"MinNeighbors\"] and ord_dict[\"MinFill\"] == ord_dict[\"MinWeight\"] and ord_dict[\"MinFill\"] == ord_dict[\"WeightedMinFill\"]:\n", " print(f'All heuristics find the same {pre}.\\n{ord_dict[\"MinFill\"]}\\n')\n", " else:\n", " print(f'Different {pre}')\n", " for heuristic, order in ord_dict.items():\n", " print(f'{padding(heuristic)} {order}')\n", " print()\n", " \n", "infer = VariableElimination(model)\n", "\n", "var = ['GDP']\n", "heuristic = \"MinNeighbors\"\n", "ordering = get_ordering(infer, variables=var, elimination_order=heuristic,\n", " desc=f'Elimination order for {var} with no evidence computed through {heuristic} heuristic:')\n", "query_report(infer, variables=var, elimination_order=ordering, \n", " desc=f'Probability query of {var} with no evidence through precomputed elimination order:') \n", "query_report(infer, variables=var, elimination_order=list(reversed(ordering)), \n", " desc=f'Probability query of {var} with no evidence through dummy elimination order:')\n", "compare_all_ordering(infer, variables=var)\n", "\n", "var = ['CO2']\n", "ev = {'EC': 'A: -7.05 to -0.12'}\n", "heuristic = \"MinFill\"\n", "query_report(infer, variables=var, evidence=ev, elimination_order=heuristic, \n", " desc=f'Probability query of {var} with evidence {ev} computed through {heuristic} heuristic:')\n", "compare_all_ordering(infer, variables=var, evidence=ev)\n", "heuristic = \"MinNeighbors\"\n", "query_report(infer, variables=var, evidence=ev, elimination_order=heuristic, \n", " desc=f'Probability query of {var} with evidence {ev} computed through {heuristic} heuristic:') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This last part of this document try to answer to some **key questions** using probability queries about the greenhouse gases scenario:\n", "\n", "- a considerable population growth lead to an increase of energy consumption and CO2 diffusion\n", "- an increase of urbanization growth rate and GDP per capita are important factors in CO2 diffusion\n", "- energy consumption reduction lead to CO2 diffusion reduction\n", "- an increase of CO2 diffusion is evidence of an increase of CH4 and N2O diffusion\n", "- a reduction of CO2 diffusion is evidence of a reduction of CH4 and N2O diffusion\n", "- fossil fuel consumption reduction and renewable energy consumption increase impact significantly on greenhouse gases diffusion." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Probability of energy consumption increase with population growth over +0.50%:\n", "+------------------------+-----------+\n", "| EC | phi(EC) |\n", "+========================+===========+\n", "| EC(A: -7.05 to -0.12) | 0.3087 |\n", "+------------------------+-----------+\n", "| EC(B: -0.12 to +2.79) | 0.3011 |\n", "+------------------------+-----------+\n", "| EC(C: +2.79 to +13.06) | 0.3902 |\n", "+------------------------+-----------+\n", "--- Query executed in 0.0331 seconds ---\n", "\n", "Probability of CO2 increase with population growth over +0.50%:\n", "+-------------------------+------------+\n", "| CO2 | phi(CO2) |\n", "+=========================+============+\n", "| CO2(A: -10.20 to -0.79) | 0.3704 |\n", "+-------------------------+------------+\n", "| CO2(B: -0.79 to +3.81) | 0.4008 |\n", "+-------------------------+------------+\n", "| CO2(C: +3.81 to +17.59) | 0.2288 |\n", "+-------------------------+------------+\n", "--- Query executed in 0.0276 seconds ---\n", "\n", "Probability of CO2 increase with urbanization growth over +0.82%:\n", "+-------------------------+------------+\n", "| CO2 | phi(CO2) |\n", "+=========================+============+\n", "| CO2(A: -10.20 to -0.79) | 0.3793 |\n", "+-------------------------+------------+\n", "| CO2(B: -0.79 to +3.81) | 0.3965 |\n", "+-------------------------+------------+\n", "| CO2(C: +3.81 to +17.59) | 0.2242 |\n", "+-------------------------+------------+\n", "--- Query executed in 0.0253 seconds ---\n", "\n", "Probability of CO2 increase with GDP per capita growth over +2.89%:\n", "+-------------------------+------------+\n", "| CO2 | phi(CO2) |\n", "+=========================+============+\n", "| CO2(A: -10.20 to -0.79) | 0.3570 |\n", "+-------------------------+------------+\n", "| CO2(B: -0.79 to +3.81) | 0.4021 |\n", "+-------------------------+------------+\n", "| CO2(C: +3.81 to +17.59) | 0.2409 |\n", "+-------------------------+------------+\n", "--- Query executed in 0.0234 seconds ---\n", "\n", "Probability of CO2 increase with:\n", " - population growth over +0.50%\n", " - urbanization growth over +0.82%\n", " - GDP per capita growth over +2.89%:\n", "+-------------------------+------------+\n", "| CO2 | phi(CO2) |\n", "+=========================+============+\n", "| CO2(A: -10.20 to -0.79) | 0.3044 |\n", "+-------------------------+------------+\n", "| CO2(B: -0.79 to +3.81) | 0.4011 |\n", "+-------------------------+------------+\n", "| CO2(C: +3.81 to +17.59) | 0.2945 |\n", "+-------------------------+------------+\n", "--- Query executed in 0.0251 seconds ---\n", "\n", "Probability of CO2 increase with energy consumption growth under -0.12%:\n", "+-------------------------+------------+\n", "| CO2 | phi(CO2) |\n", "+=========================+============+\n", "| CO2(A: -10.20 to -0.79) | 0.4721 |\n", "+-------------------------+------------+\n", "| CO2(B: -0.79 to +3.81) | 0.3765 |\n", "+-------------------------+------------+\n", "| CO2(C: +3.81 to +17.59) | 0.1514 |\n", "+-------------------------+------------+\n", "--- Query executed in 0.0320 seconds ---\n", "\n", "Probability of CO2 increase with energy consumption growth over +2.79%:\n", "+-------------------------+------------+\n", "| CO2 | phi(CO2) |\n", "+=========================+============+\n", "| CO2(A: -10.20 to -0.79) | 0.2959 |\n", "+-------------------------+------------+\n", "| CO2(B: -0.79 to +3.81) | 0.3984 |\n", "+-------------------------+------------+\n", "| CO2(C: +3.81 to +17.59) | 0.3058 |\n", "+-------------------------+------------+\n", "--- Query executed in 0.0295 seconds ---\n", "\n", "Probability of CH4 increase with CO2 growth over +3.81%:\n", "+------------------------+------------+\n", "| CH4 | phi(CH4) |\n", "+========================+============+\n", "| CH4(A: -6.02 to -0.31) | 0.4789 |\n", "+------------------------+------------+\n", "| CH4(B: -0.31 to +2.11) | 0.2765 |\n", "+------------------------+------------+\n", "| CH4(C: +2.11 to +9.00) | 0.2446 |\n", "+------------------------+------------+\n", "--- Query executed in 0.0279 seconds ---\n", "\n", "Probability of CH4 increase with CO2 growth under -0.79%:\n", "+------------------------+------------+\n", "| CH4 | phi(CH4) |\n", "+========================+============+\n", "| CH4(A: -6.02 to -0.31) | 0.4573 |\n", "+------------------------+------------+\n", "| CH4(B: -0.31 to +2.11) | 0.3776 |\n", "+------------------------+------------+\n", "| CH4(C: +2.11 to +9.00) | 0.1650 |\n", "+------------------------+------------+\n", "--- Query executed in 0.0219 seconds ---\n", "\n", "Probability of CH4 increase with no given evidence:\n", "+------------------------+------------+\n", "| CH4 | phi(CH4) |\n", "+========================+============+\n", "| CH4(A: -6.02 to -0.31) | 0.5014 |\n", "+------------------------+------------+\n", "| CH4(B: -0.31 to +2.11) | 0.3155 |\n", "+------------------------+------------+\n", "| CH4(C: +2.11 to +9.00) | 0.1831 |\n", "+------------------------+------------+\n", "--- Query executed in 0.0198 seconds ---\n", "\n", "Probability of N2O increase with CO2 growth over +3.81%:\n", "+-------------------------+------------+\n", "| N2O | phi(N2O) |\n", "+=========================+============+\n", "| N2O(A: -4.31 to +0.61) | 0.2901 |\n", "+-------------------------+------------+\n", "| N2O(B: +0.61 to +2.75) | 0.2807 |\n", "+-------------------------+------------+\n", "| N2O(C: +2.75 to +12.61) | 0.4292 |\n", "+-------------------------+------------+\n", "--- Query executed in 0.0220 seconds ---\n", "\n", "Probability of N2O increase with CO2 growth under -0.79%:\n", "+-------------------------+------------+\n", "| N2O | phi(N2O) |\n", "+=========================+============+\n", "| N2O(A: -4.31 to +0.61) | 0.4274 |\n", "+-------------------------+------------+\n", "| N2O(B: +0.61 to +2.75) | 0.2968 |\n", "+-------------------------+------------+\n", "| N2O(C: +2.75 to +12.61) | 0.2758 |\n", "+-------------------------+------------+\n", "--- Query executed in 0.0223 seconds ---\n", "\n", "Probability of N2O increase with no given evidence:\n", "+-------------------------+------------+\n", "| N2O | phi(N2O) |\n", "+=========================+============+\n", "| N2O(A: -4.31 to +0.61) | 0.3699 |\n", "+-------------------------+------------+\n", "| N2O(B: +0.61 to +2.75) | 0.2982 |\n", "+-------------------------+------------+\n", "| N2O(C: +2.75 to +12.61) | 0.3319 |\n", "+-------------------------+------------+\n", "--- Query executed in 0.0209 seconds ---\n", "\n", "Joint probability of fossil fuel energy consumption and renewable energy consumption:\n", "+--------------------------+-------------------------+-----------------+\n", "| REC | FFEC | phi(REC,FFEC) |\n", "+==========================+=========================+=================+\n", "| REC(A: -14.97 to +2.03) | FFEC(A: -2.72 to -0.40) | 0.1110 |\n", "+--------------------------+-------------------------+-----------------+\n", "| REC(A: -14.97 to +2.03) | FFEC(B: -0.40 to +0.16) | 0.1221 |\n", "+--------------------------+-------------------------+-----------------+\n", "| REC(A: -14.97 to +2.03) | FFEC(C: +0.16 to +2.34) | 0.1290 |\n", "+--------------------------+-------------------------+-----------------+\n", "| REC(B: +2.03 to +11.51) | FFEC(A: -2.72 to -0.40) | 0.1141 |\n", "+--------------------------+-------------------------+-----------------+\n", "| REC(B: +2.03 to +11.51) | FFEC(B: -0.40 to +0.16) | 0.1081 |\n", "+--------------------------+-------------------------+-----------------+\n", "| REC(B: +2.03 to +11.51) | FFEC(C: +0.16 to +2.34) | 0.1140 |\n", "+--------------------------+-------------------------+-----------------+\n", "| REC(C: +11.51 to +23.95) | FFEC(A: -2.72 to -0.40) | 0.1276 |\n", "+--------------------------+-------------------------+-----------------+\n", "| REC(C: +11.51 to +23.95) | FFEC(B: -0.40 to +0.16) | 0.1037 |\n", "+--------------------------+-------------------------+-----------------+\n", "| REC(C: +11.51 to +23.95) | FFEC(C: +0.16 to +2.34) | 0.0705 |\n", "+--------------------------+-------------------------+-----------------+\n", "--- Query executed in 0.0218 seconds ---\n", "\n", "Probability of CO2 increase with:\n", " - fossil fuel energy consumption growth under -0.40%\n", " - renewable energy consumption growth over +11.51%\n", "+-------------------------+------------+\n", "| CO2 | phi(CO2) |\n", "+=========================+============+\n", "| CO2(A: -10.20 to -0.79) | 0.6992 |\n", "+-------------------------+------------+\n", "| CO2(B: -0.79 to +3.81) | 0.2602 |\n", "+-------------------------+------------+\n", "| CO2(C: +3.81 to +17.59) | 0.0407 |\n", "+-------------------------+------------+\n", "--- Query executed in 0.0250 seconds ---\n", "\n", "Probability of CH4 increase with:\n", " - fossil fuel energy consumption growth under -0.40%\n", " - renewable energy consumption growth over +11.51%\n", "+------------------------+------------+\n", "| CH4 | phi(CH4) |\n", "+========================+============+\n", "| CH4(A: -6.02 to -0.31) | 0.2242 |\n", "+------------------------+------------+\n", "| CH4(B: -0.31 to +2.11) | 0.7152 |\n", "+------------------------+------------+\n", "| CH4(C: +2.11 to +9.00) | 0.0606 |\n", "+------------------------+------------+\n", "--- Query executed in 0.0239 seconds ---\n", "\n", "Probability of N2O increase with:\n", " - fossil fuel energy consumption growth under -0.40%\n", " - renewable energy consumption growth over +11.51%\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "+-------------------------+------------+\n", "| N2O | phi(N2O) |\n", "+=========================+============+\n", "| N2O(A: -4.31 to +0.61) | 0.7152 |\n", "+-------------------------+------------+\n", "| N2O(B: +0.61 to +2.75) | 0.0606 |\n", "+-------------------------+------------+\n", "| N2O(C: +2.75 to +12.61) | 0.2242 |\n", "+-------------------------+------------+\n", "--- Query executed in 0.0273 seconds ---\n", "\n" ] } ], "source": [ "query_report(infer, variables=['EC'], evidence={'Pop': 'C: +0.50 to +1.99'},\n", " desc='Probability of energy consumption increase with population growth over +0.50%:')\n", "query_report(infer, variables=['CO2'], evidence={'Pop': 'C: +0.50 to +1.99'},\n", " desc='Probability of CO2 increase with population growth over +0.50%:')\n", "query_report(infer, variables=['CO2'], evidence={'Urb': 'C: +0.82 to +2.84'}, \n", " desc='Probability of CO2 increase with urbanization growth over +0.82%:')\n", "query_report(infer, variables=['CO2'], evidence={'GDP': 'C: +2.89 to +7.49'}, \n", " desc='Probability of CO2 increase with GDP per capita growth over +2.89%:')\n", "query_report(infer, variables=['CO2'], \n", " evidence={'Pop': 'C: +0.50 to +1.99', 'Urb': 'C: +0.82 to +2.84', 'GDP': 'C: +2.89 to +7.49'},\n", " desc='Probability of CO2 increase with:\\n - population growth over +0.50%\\n - urbanization growth over +0.82%\\n - GDP per capita growth over +2.89%:')\n", "query_report(infer, variables=['CO2'], evidence={'EC': 'A: -7.05 to -0.12'},\n", " desc='Probability of CO2 increase with energy consumption growth under -0.12%:')\n", "query_report(infer, variables=['CO2'], evidence={'EC': 'C: +2.79 to +13.06'},\n", " desc='Probability of CO2 increase with energy consumption growth over +2.79%:')\n", "query_report(infer, variables=['CH4'], evidence={'CO2': 'C: +3.81 to +17.59'},\n", " desc='Probability of CH4 increase with CO2 growth over +3.81%:')\n", "query_report(infer, variables=['CH4'], evidence={'CO2': 'A: -10.20 to -0.79'},\n", " desc='Probability of CH4 increase with CO2 growth under -0.79%:')\n", "query_report(infer, variables=['CH4'],\n", " desc='Probability of CH4 increase with no given evidence:')\n", "query_report(infer, variables=['N2O'], evidence={'CO2': 'C: +3.81 to +17.59'},\n", " desc='Probability of N2O increase with CO2 growth over +3.81%:')\n", "query_report(infer, variables=['N2O'], evidence={'CO2': 'A: -10.20 to -0.79'},\n", " desc='Probability of N2O increase with CO2 growth under -0.79%:')\n", "query_report(infer, variables=['N2O'],\n", " desc='Probability of N2O increase with no given evidence:')\n", "query_report(infer, variables=['FFEC', 'REC'],\n", " desc='Joint probability of fossil fuel energy consumption and renewable energy consumption:')\n", "query_report(infer, variables=['CO2'], evidence={'FFEC': 'A: -2.72 to -0.40', 'REC': 'C: +11.51 to +23.95'},\n", " desc='Probability of CO2 increase with:\\n - fossil fuel energy consumption growth under -0.40%\\n - renewable energy consumption growth over +11.51%')\n", "query_report(infer, variables=['CH4'], evidence={'FFEC': 'A: -2.72 to -0.40', 'REC': 'C: +11.51 to +23.95'},\n", " desc='Probability of CH4 increase with:\\n - fossil fuel energy consumption growth under -0.40%\\n - renewable energy consumption growth over +11.51%')\n", "query_report(infer, variables=['N2O'], evidence={'FFEC': 'A: -2.72 to -0.40', 'REC': 'C: +11.51 to +23.95'},\n", " desc='Probability of N2O increase with:\\n - fossil fuel energy consumption growth under -0.40%\\n - renewable energy consumption growth over +11.51%')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can get the following **outcomes** from the queries on the modelled Bayesian network:\n", "\n", "- **Energy consumption vs Population, Urbanization, GDP**: the increase of gases diffusion is more clearly proportional to energy consumption growth rather than population, urbanization or GDP per capita growth. This result can be a symptom that some important casual factors of the energy consumption growth strictly correlated with greenhouse gases diffusion, such as industrialization, are not represented into the network.\n", "- N2O vs CH4: N2O growth trends almost resemble the ones of CO2, whereas the CH4 growth seems very likely to decrease regardless of the evidence. This is unlikely to be the real situation, and the probability query of CH4 without given evidence proves it: the lower tier 'A' is much more likely than the others, and that is a consequence of data sparsity. The other 2 tiers of CH4 are underrepresented. That is not the case of N2O, whose tiers are more balanced.\n", "- **Fossil fuel vs Renewable**: finally and most importantly, the output probabilites of the last queries suggest that the reduction of fossil fuel energy consumption and the increase of renewable energy consumption are really effective strategies to minimize greenhouse gases emissions, hence it is really important to use sustainable energies." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "[1] [*U.S. Environmental Protection Agency - Greenhouse Gas Emissions*](https://www.epa.gov/ghgemissions/overview-greenhouse-gases)\n", "\n", "[2] [*Didem Cinar, Gulgun Kayakutlu - Scenario analysis using Bayesian networks: A case study in energy sector*](https://www.sciencedirect.com/science/article/pii/S0950705110000110)\n", "\n", "[3] [*World Bank Open Data*](https://data.worldbank.org/indicator)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 4 }